Robust, Light-weight Approaches to compute Lexical Similarity
Authors
Abstract
Most text processing systems need to compare lexical units – words, entities, semantic concepts – with each other as a basic processing step within large and complex systems. A significant amount of research has taken place in formulating and evaluating multiple similarity metrics, primarily between words. Often, such techniques are resource-intensive or are applicable only to specific use cases. In this technical report, we summarize some of our research work on finding robust, lightweight approaches to compute similarity between two spans of text. We describe two new similarity measures, WNSim for word similarity and NESim for named entity similarity, which in our experience have been more useful than more standard similarity metrics. We also present a technique, Lexical Level Matching (LLM), to combine such token-level similarity measures into phrase- and sentence-level similarity scores. We have found LLM to be useful in a number of NLP applications; it is easy to compute, and surprisingly robust to ...
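The abstract is truncated above, but the LLM idea can be sketched independently of WNSim and NESim. Below is a minimal, illustrative sketch (not the authors' implementation): each token in one span is aligned with its best-matching token in the other span, and the resulting similarities are averaged. The `token_sim` parameter is an assumed plug-in point for any token-level metric.

```python
from typing import Callable, Sequence

def llm_score(
    source: Sequence[str],
    target: Sequence[str],
    token_sim: Callable[[str, str], float],
) -> float:
    """LLM-style aggregation: align each source token with its
    best-matching target token and average the similarities.

    `token_sim` is assumed to return a score in [0, 1]; a metric
    such as WNSim or NESim could be plugged in here.
    """
    if not source or not target:
        return 0.0
    # Greedy many-to-one alignment: each source token independently
    # picks the target token it is most similar to.
    total = sum(max(token_sim(s, t) for t in target) for s in source)
    return total / len(source)

# Toy usage with exact string match as a stand-in token similarity.
if __name__ == "__main__":
    sim = lambda a, b: 1.0 if a == b else 0.0
    print(llm_score("the cat sat".split(), "a cat sat down".split()))  # 2/3
```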
Similar resources
A heuristic light robust approach to increase the quality of robust solutions
In this paper, optimization problems that seek robust solutions under uncertainty are considered. The light robust approach is a strong and relatively new method for achieving robust solutions under conditions of uncertainty. We try to improve the quality of the solutions obtained from the light robust method by introducing a revised approach. Considering the problem concerned, ...
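The snippet is cut off, but the light robust idea it builds on can be illustrated with a toy linear program. The sketch below assumes the standard light robustness recipe (soften the worst-case constraint with a slack variable and cap the loss in nominal objective value); the numbers and the 0.8 worst-case coefficient are invented for illustration, and this is not the revised approach the paper proposes.

```python
# Toy illustration of the light robustness idea: accept a small loss
# of nominal quality (delta) and minimize the slack by which the
# worst-case constraint is violated.
from scipy.optimize import linprog

# Nominal problem: min x1 + 2*x2  s.t.  x1 + x2 >= 10,  x >= 0.
nominal = linprog(c=[1, 2], A_ub=[[-1, -1]], b_ub=[-10])
z_star = nominal.fun  # nominal optimum: 10.0

delta = 0.10  # tolerated worsening of the nominal objective (10%)

# Variables: [x1, x2, gamma]; minimize the slack gamma of the
# worst-case constraint 0.8*x1 + x2 + gamma >= 10, where 0.8 models
# an uncertain coefficient at its worst realization.
res = linprog(
    c=[0, 0, 1],
    A_ub=[
        [-0.8, -1, -1],  # worst-case constraint, softened by gamma
        [-1.0, -1, 0],   # nominal feasibility is still required
        [1.0, 2, 0],     # budget: x1 + 2*x2 <= (1 + delta) * z*
    ],
    b_ub=[-10, -10, (1 + delta) * z_star],
)
x1, x2, gamma = res.x
print(f"light-robust solution: x=({x1:.2f}, {x2:.2f}), slack={gamma:.2f}")
```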
A Hybrid Distributional and Knowledge-based Model of Lexical Semantics
A range of approaches to the representation of lexical semantics has been explored within Computational Linguistics. Two of the most popular are distributional and knowledge-based models. This paper proposes hybrid models of lexical semantics that combine the advantages of these two approaches. Our models provide robust representations of synonymous words derived from WordNet. We also make use ...
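As a rough illustration of how distributional and knowledge-based signals might be combined, the sketch below smooths a word's distributional vector with the vectors of its WordNet synonyms. The `embedding` lookup and mixing weight `alpha` are assumptions for illustration; this is not the paper's actual model.

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires the 'wordnet' corpus

def synonym_smoothed_vector(word, embedding, alpha=0.5):
    """Mix a word's own distributional vector with the mean vector of
    its WordNet synonyms: alpha * own + (1 - alpha) * synonym mean.

    `embedding` is an assumed mapping from words to numpy vectors,
    e.g. a dict of pretrained distributional vectors.
    """
    own = embedding[word]
    syn_vecs = [
        embedding[lemma.name()]
        for synset in wn.synsets(word)
        for lemma in synset.lemmas()
        if lemma.name() != word and lemma.name() in embedding
    ]
    if not syn_vecs:
        return own  # no known synonyms: fall back to the raw vector
    return alpha * own + (1 - alpha) * np.mean(syn_vecs, axis=0)
```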
Robust and High Fidelity Mesh Denoising
This paper presents a simple and effective two-stage mesh denoising algorithm. In the first stage, face normal filtering is performed using bilateral normal filtering within a robust statistics framework. Tukey's bi-weight function, a robust estimator, is used as the similarity function in the bilateral weighting; it stops the diffusion at sharp edges, which helps to retain feature...
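Tukey's bi-weight function itself is small enough to show. The sketch below gives the standard weight function from robust statistics (the mesh and normal-filtering machinery is omitted); the scale parameter `c` and the sample inputs are illustrative.

```python
import numpy as np

def tukey_biweight(x, c):
    """Weight w(x) = (1 - (x/c)^2)^2 for |x| <= c, else 0.

    Differences beyond the scale c receive zero weight, which is
    what stops diffusion across sharp edges in bilateral filtering.
    """
    x = np.asarray(x, dtype=float)
    w = (1.0 - (x / c) ** 2) ** 2
    return np.where(np.abs(x) <= c, w, 0.0)

# e.g. weighting the difference between two face normals: small
# differences keep weight near 1, large ones (sharp edges) get 0.
print(tukey_biweight([0.0, 0.2, 0.5, 1.2], c=1.0))
```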
Lexical Semantic Relatedness with Random Graph Walks
Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word-pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from ...
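The multi-path idea can be sketched with a personalized random walk on a toy graph. The example below uses networkx's personalized PageRank as a stand-in for the paper's walk model; the graph, damping factor, and dot-product comparison are illustrative assumptions.

```python
import networkx as nx

def walk_distribution(graph, word, alpha=0.85):
    """Stationary distribution of a random walk that teleports back
    to `word`; mass spreads over *all* paths, not just the shortest."""
    return nx.pagerank(graph, alpha=alpha, personalization={word: 1.0})

def relatedness(graph, w1, w2):
    """Compare the two words' walk distributions (dot product)."""
    d1, d2 = walk_distribution(graph, w1), walk_distribution(graph, w2)
    return sum(d1[n] * d2[n] for n in graph)

# Toy thesaurus graph; a real one would come from WordNet or similar.
G = nx.Graph([("car", "auto"), ("auto", "vehicle"), ("vehicle", "truck")])
print(relatedness(G, "car", "truck"))
```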
Random Walks for Text Semantic Similarity
Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a random walk ...
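Extending the walk idea from the previous entry to whole texts, one plausible aggregation is to seed the walk with all of a text's words and compare the resulting distributions. The sketch below uses cosine similarity, uniform seeding, and a toy graph; these choices are assumptions for illustration, not necessarily those of the paper.

```python
import math
import networkx as nx

def text_walk_distribution(graph, words, alpha=0.85):
    """Random-walk distribution seeded uniformly with a text's words."""
    seeds = {w: 1.0 / len(words) for w in words if w in graph}
    return nx.pagerank(graph, alpha=alpha, personalization=seeds)

def cosine(d1, d2, nodes):
    """Cosine similarity between two walk distributions."""
    dot = sum(d1[n] * d2[n] for n in nodes)
    n1 = math.sqrt(sum(v * v for v in d1.values()))
    n2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2)

G = nx.Graph([("car", "auto"), ("auto", "vehicle"), ("truck", "vehicle")])
da = text_walk_distribution(G, ["car", "auto"])
db = text_walk_distribution(G, ["truck", "vehicle"])
print(cosine(da, db, G))
```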